Import graphlab


In [1]:
import graphlab


[WARNING] sklearn version 0.14.1 is not supported. Minimum required version: 0.15. sklearn support will be disabled.

Load tabular data


In [3]:
sf = graphlab.SFrame('people-example.csv')


[INFO] This non-commercial license of GraphLab Create is assigned to akshay.narayan@u.nus.edu and will expire on September 26, 2016. For commercial licensing options, visit https://dato.com/buy/.

[INFO] Start server at: ipc:///tmp/graphlab_server-17038 - Server binary: /usr/local/lib/python2.7/dist-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1443340550.log
[INFO] GraphLab Server Version: 1.6.1
PROGRESS: Finished parsing file /home/anarayan/Workspace/pyDataAnalysis/sframes/people-example.csv
PROGRESS: Parsing completed. Parsed 7 lines in 0.008711 secs.
------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[str,str,str,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
PROGRESS: Finished parsing file /home/anarayan/Workspace/pyDataAnalysis/sframes/people-example.csv
PROGRESS: Parsing completed. Parsed 7 lines in 0.009636 secs.
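
The log above reports that the column types were inferred from the first line of the file. As a rough illustration of that idea in plain Python (a sketch only, not GraphLab's actual parser; `infer_type` and `infer_types` are hypothetical helper names), each field of the first data row can be tried as an int, then as a float, before falling back to str:

```python
def infer_type(value):
    """Guess the narrowest Python type that can parse a CSV field."""
    for candidate in (int, float):
        try:
            candidate(value)
            return candidate
        except ValueError:
            continue
    return str  # nothing numeric parsed, so treat the field as text


def infer_types(first_row):
    """Return one inferred type per field of the first data row."""
    return [infer_type(field) for field in first_row]


# First data row of people-example.csv, as displayed by sf.head()
first_row = ['Bob', 'Smith', 'United States', '24']
hints = infer_types(first_row)
print([t.__name__ for t in hints])
```

For this row the sketch yields str, str, str, int, matching the column_type_hints line in the log. If the inference is wrong for a given file, the log suggests passing a corrected list to read_csv through the column_type_hints argument.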

In [5]:
sf.head()  # view the first few rows of the table


Out[5]:
First Name  Last Name  Country        age
Bob         Smith      United States  24
Alice       Williams   Canada         23
Malcolm     Jone       England        22
Felix       Brown      USA            23
Alex        Cooper     Poland         23
Tod         Campbell   United States  22
Derek       Ward       Switzerland    25
[7 rows x 4 columns]

GraphLab Canvas - for visualization


In [6]:
sf.show()


Canvas is accessible via web browser at the URL: http://localhost:47843/index.html
Opening Canvas in default web browser.

In [7]:
graphlab.canvas.set_target('ipynb')

In [8]:
sf['age'].show(view='Categorical')


Inspecting some columns of the dataset


In [9]:
sf['Country']


Out[9]:
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']

In [10]:
sf['age']


Out[10]:
dtype: int
Rows: 7
[24, 23, 22, 23, 23, 22, 25]

In [11]:
sf['age'].mean()


Out[11]:
23.142857142857146

In [12]:
sf['age'].max()


Out[12]:
25
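
The aggregates above can be cross-checked in plain Python using the seven ages listed in Out[10]. This is only a sanity check, not how SFrame computes them, and the last digits of the mean may differ slightly from Out[11] because of floating-point rounding.

```python
ages = [24, 23, 22, 23, 23, 22, 25]  # values copied from Out[10]

# float() forces true division on Python 2 as well as Python 3
mean_age = float(sum(ages)) / len(ages)
max_age = max(ages)

print(mean_age)
print(max_age)
```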

Create new columns in SFrame


In [13]:
sf


Out[13]:
First Name  Last Name  Country        age
Bob         Smith      United States  24
Alice       Williams   Canada         23
Malcolm     Jone       England        22
Felix       Brown      USA            23
Alex        Cooper     Poland         23
Tod         Campbell   United States  22
Derek       Ward       Switzerland    25
[7 rows x 4 columns]

In [14]:
sf['Full Name'] = sf['First Name'] + ' ' + sf['Last Name']

In [15]:
sf


Out[15]:
First Name  Last Name  Country        age  Full Name
Bob         Smith      United States  24   Bob Smith
Alice       Williams   Canada         23   Alice Williams
Malcolm     Jone       England        22   Malcolm Jone
Felix       Brown      USA            23   Felix Brown
Alex        Cooper     Poland         23   Alex Cooper
Tod         Campbell   United States  22   Tod Campbell
Derek       Ward       Switzerland    25   Derek Ward
[7 rows x 5 columns]
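
As Out[15] shows, the + operator on two string columns concatenates them row by row. A plain-Python sketch of the same row-wise operation, using the first three rows of the table (the list names here are illustrative only):

```python
first_names = ['Bob', 'Alice', 'Malcolm']
last_names = ['Smith', 'Williams', 'Jone']

# Pair up the values row by row and join each pair with a single space
full_names = [first + ' ' + last
              for first, last in zip(first_names, last_names)]
print(full_names)
```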

Advanced transformation: the .apply() function


In [16]:
sf['Country']


Out[16]:
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']

In [18]:
sf['Country'].show()



In [19]:
def transformCountry(country):
    # Map the abbreviation 'USA' to 'United States'; leave other values unchanged
    if country == 'USA':
        return 'United States'
    return country

In [21]:
transformCountry('USA')


Out[21]:
'United States'

In [23]:
sf['Country'].apply(transformCountry)


Out[23]:
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'United States', 'Poland', 'United States', 'Switzerland']

In [24]:
sf['Country']


Out[24]:
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']

In [25]:
sf['Country'] = sf['Country'].apply(transformCountry)

In [26]:
sf


Out[26]:
First Name  Last Name  Country        age  Full Name
Bob         Smith      United States  24   Bob Smith
Alice       Williams   Canada         23   Alice Williams
Malcolm     Jone       England        22   Malcolm Jone
Felix       Brown      United States  23   Felix Brown
Alex        Cooper     Poland         23   Alex Cooper
Tod         Campbell   United States  22   Tod Campbell
Derek       Ward       Switzerland    25   Derek Ward
[7 rows x 5 columns]
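
If more spellings need normalizing, the if/else in transformCountry could be replaced by a dictionary lookup. The sketch below is one possible variant, not part of the original notebook: `COUNTRY_ALIASES`, `normalize_country`, and the extra alias 'U.S.A.' are hypothetical. It is demonstrated on a plain list; the same function could presumably be passed to .apply() in place of transformCountry.

```python
# Hypothetical alias table: maps alternate spellings to one canonical name
COUNTRY_ALIASES = {
    'USA': 'United States',
    'U.S.A.': 'United States',
}


def normalize_country(country):
    """Return the canonical name for a known alias, else the input unchanged."""
    return COUNTRY_ALIASES.get(country, country)


# Country values as they appear in Out[16], before the transformation
countries = ['United States', 'Canada', 'England', 'USA',
             'Poland', 'United States', 'Switzerland']
cleaned = [normalize_country(c) for c in countries]
print(cleaned)
```

Keeping the aliases in a dictionary means a new spelling is handled by adding one entry rather than another branch.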
